The goal of the study is to investigate the effects of high-scale urbanization on the body size of Barn Swallows Hirundo rustica gutturalis in China. This study is particularly interesting as it addresses these effects in regards to Swallows of different environmental and climate origins. The design of the experiment was to collect physiological data (body mass and wing length) of breeding individuals from 128 sites at varying stages of urbanization across China. The unit of observation is a singular swallow.
Does urbanization affect body size differently between different sexes?
Independent variables will be (1) Urbanization and (2) Sex. I will fit 2 different models (body mass and wing length as response variables). ANOVA, Regression, Assumption tests (data is probably not normal, so log transformations.) In order to answer my biological question, I need to include an interaction term between urbanization and sex.
# Load packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
library(cowplot)
library(emmeans)
library(DHARMa)
## This is DHARMa 0.4.6. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa')
# Set contrasts
options(contrasts=c("contr.poly","contr.sum"))
# Import the dataset:
# Code to import a dataset
BodySize_data <- read.csv("/Users/winny/Downloads/doi_10-10/Liu_Winson Dataset/BodySize_data.csv", stringsAsFactors=TRUE)
# Does urbanization affect body size [Body mass / Wing length] differently between different sexes?
# Check for missing values
is.na(BodySize_data$sex)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [301] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [349] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
is.na(BodySize_data$urbanization)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [301] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [349] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
is.na(BodySize_data$body.mass)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [301] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [349] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
is.na(BodySize_data$wing.length)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [301] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [349] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Fit the model: (body mass)
# Assumptions failed
mod1 = lm(body.mass ~ urbanization * sex, data=BodySize_data)
# Assumptions passed
modlog1 = lm(log(body.mass) ~ urbanization*sex, data=BodySize_data)
# Analysis
# Anova
Anova(modlog1, type=3)
## Anova Table (Type III tests)
##
## Response: log(body.mass)
## Sum Sq Df F value Pr(>F)
## (Intercept) 958.72 1 1.3234e+05 < 2.2e-16 ***
## urbanization 0.30 1 4.0729e+01 5.466e-10 ***
## sex 0.12 1 1.6658e+01 5.533e-05 ***
## urbanization:sex 0.01 1 9.4780e-01 0.3309
## Residuals 2.57 355
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Summary
summary(modlog1)
##
## Call:
## lm(formula = log(body.mass) ~ urbanization * sex, data = BodySize_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.270960 -0.051910 -0.001694 0.059191 0.262592
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8063587 0.0077143 363.787 < 2e-16 ***
## urbanization -0.0007048 0.0001104 -6.382 5.47e-10 ***
## sex.L -0.0445267 0.0109096 -4.081 5.53e-05 ***
## urbanization:sex.L -0.0001521 0.0001562 -0.974 0.331
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08511 on 355 degrees of freedom
## Multiple R-squared: 0.2289, Adjusted R-squared: 0.2224
## F-statistic: 35.14 on 3 and 355 DF, p-value: < 2.2e-16
# Fit the model 2: (wing length)
# Assumptions failed
mod2 = lm(wing.length ~ urbanization * sex, data=BodySize_data)
# Assumptions failed
modlog2 = lm(log(wing.length) ~ urbanization * sex, data=BodySize_data)
modglm2 = glm(wing.length ~ urbanization*sex, data=BodySize_data, family="Gamma")
# Analysis
# Anova
Anova(modglm2, type=3)
## Analysis of Deviance Table (Type III tests)
##
## Response: wing.length
## LR Chisq Df Pr(>Chisq)
## urbanization 27.8007 1 1.345e-07 ***
## sex 6.3275 1 0.01189 *
## urbanization:sex 0.7445 1 0.38822
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Summary
summary(modglm2)
##
## Call:
## glm(formula = wing.length ~ urbanization * sex, family = "Gamma",
## data = BodySize_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.067952 -0.015894 -0.001332 0.015694 0.095466
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.473e-03 1.950e-05 434.468 < 2e-16 ***
## urbanization 1.481e-06 2.808e-07 5.276 2.3e-07 ***
## sex.L -6.935e-05 2.758e-05 -2.514 0.0124 *
## urbanization:sex.L -3.427e-07 3.971e-07 -0.863 0.3888
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.0006413438)
##
## Null deviance: 0.26669 on 358 degrees of freedom
## Residual deviance: 0.22685 on 355 degrees of freedom
## AIC: 1802.5
##
## Number of Fisher Scoring iterations: 3
# 1: Body Mass
# Check assumptions (mod1)
plot(mod1, which=2) # Fails normality assumptions
shapiro.test(mod1$residuals)
##
## Shapiro-Wilk normality test
##
## data: mod1$residuals
## W = 0.99101, p-value = 0.02772
plot(mod1, which=3) # Passes homogeneity of variance assumption (roughly horizontal through with random pattern)
# Assumptions failed
# Check assumptions:
plot(modlog1, which=2) # passes normality assumption
shapiro.test(modlog1$residuals)
##
## Shapiro-Wilk normality test
##
## data: modlog1$residuals
## W = 0.99718, p-value = 0.7962
plot(modlog1, which=3) # passes homogeneity assumption
# Assumptions pass
# 2: Wing Length
# Check assumptions (mod2)
plot(mod2, which=2) # Fails
shapiro.test((mod2$residuals))
##
## Shapiro-Wilk normality test
##
## data: (mod2$residuals)
## W = 0.99052, p-value = 0.02054
plot(mod2, which=3) # Passes homogeneity of variance assumptions
# Assumptions fail
# Normality assumptions (modlog2)
plot(modlog2, which = 2) # Passes normality
shapiro.test(modlog2$residuals)
##
## Shapiro-Wilk normality test
##
## data: modlog2$residuals
## W = 0.99251, p-value = 0.06866
plot(modlog2, which=3) # Passes homogeneity of variance assumption
# Assumptions fails
# Check assumptions (modglm2)
plot(simulateResiduals(modglm2))
# Fails normality assumptions but best-looking diagnostic plots
# ggplot 1 (Body Mass)
plot1 = ggplot(data=mod1, aes(x=urbanization, y=body.mass, color=sex)) +
geom_point() +
geom_smooth(method = "lm", se=TRUE) +
scale_color_manual(values=c("red","blue")) +
theme(plot.title = element_text(hjust=0.5)) + # Center title
theme(title=element_text(size=8)) + # Font size
theme(title=element_text(face="bold")) + # Font bold
ggtitle("Sex and Urbanization on Body Mass") + # Creating title
labs(x="Urbanization (%)", y= "Body Mass (g)", color = "Sex") # Label axis
# ggplot 2 (Wing Length)
plot2 = ggplot(data=modglm2, aes(x=urbanization, y=wing.length, color=sex)) +
geom_point() +
geom_smooth(method = "lm", se=TRUE) +
scale_color_manual(values=c("red","blue")) +
theme(plot.title = element_text(hjust=0.5)) + # Center title
theme(title=element_text(size=8)) + # Font size
theme(title=element_text(face="bold")) + # Bond font
ggtitle("Sex and Urbanization on Wing Length") + # Creates titles
labs(x="Urbanization (%)", y= "Wing Length(mm)", color="Sex") # Label axis
# Side-by-side ggplot
plot_grid(plot1, plot2, labels = "AUTO") # Plot the graphs side-by-side
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
In this analysis, I fitted two separate models as a
proxy for body size of barn swallows with urbanization and sex as
explanatory variables for both models: I wanted to explore whether an
interaction term existed between the explanatory variables. Based on the
graphs made, there is no significant interaction term between
“Urbanization” and “Sex” highlighted by their parallelism in regression
lines. Our Anova function netted that individual effects of A
(Urbanization) and B(Sex) are statistically significant (p<0.05),
while A:B interaction was nonsignificant (p>0.05). This suggests that
our response is dependent on variables A and B effects but not their
joint effects.
Overall, the interpretation of our analysis is that urbanization has an inverse relationship with body size (mass or wing length): Females are bigger in body mass (g) on average compared to males, and males have longer wing lengths(mm) on average compared to females. The statistical significance of urbanization and sex bsuggests there’s statistically significant differences in body size in each group of urbanization and sex, and interaction suggests that each variable contributes to variation in body size but not to the other variable’s effects on body size.
Seeing as that body size declines with increasing urbanization, this could be an adaptation of the swallow to fit in a metropolis. If there are more buildings/structures, then there are less trees/houses that swallows are able to nest on. This is especially more possible given that skyscrapers occupy a good portion of a metropolis at a height unsuitable for nesting so being lighter/smaller is advantageous for living in decorative trees: the Southern region of China is great for farming with its warm climate, , so swallows have a good source of food and home. Thus, maybe they have more resources to direct to development.
The limitations of my interpretations is that wing length data deviates from normality assumptions, so our analysis of the data we’ve interpreted could be flawed. Another caveat of our interpretation is that it doesn’t include certain confounding variables that could be correlated: Bergmann’s rule explains animals adopt larger frames in colder regions.
Connection back to biological question: No, we can establish that there are no effects of sex on urbanization’s effect on body size as there is no interaction term (lines are parallel).
Do swallows North of Beijing have larger body mass compared to swallows South of Beijing? — Controlling for the effects of urbanization
The linear model will simply be selecting for groups of sparrows above and below a certain latitude references around a center point of Beijing. The linear model will have body mass as the response variable and latitude as th explanatory variable while controlling for urbanization.
# Run packages
library(tidyverse)
library(emmeans)
library(car)
library(effectsize)
# Set contrasts
options(contrasts=c("contr.poly", "contr.sum"))
# Import the dataset:
# Code to import a dataset
BodySize_data <- read.csv("/Users/winny/Downloads/doi_10-10/Liu_Winson Dataset/BodySize_data.csv", stringsAsFactors=TRUE)
# Do Swallows North of Beijing have larger body mass than swallows South of Beijing?
# Create latitude variable
BodySize_data$direction = ifelse(BodySize_data$latitude > 39.82022, "North", "South")
# Remove Beijing and associated data
Swallows_direction <- filter(BodySize_data, city != "Beijing")
# Missing value check
is.na(Swallows_direction$direction)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [301] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# Fit model
mod3 = lm(body.mass ~ direction + urbanization, data=Swallows_direction)
# Refit model (log-transform) - Most normal
modlog3 = lm(log(body.mass)~direction+urbanization, data=Swallows_direction)
# Refit model (GLM)
modglm3 = glm(body.mass~direction+urbanization, data=Swallows_direction, family="Gamma")
# emmeans
emmeans(modlog3, specs="direction")
## direction emmean SE df lower.CL upper.CL
## North 2.753 0.007013 344 2.739 2.767
## South 2.785 0.008323 344 2.768 2.801
##
## Results are given on the log (not the response) scale.
## Confidence level used: 0.95
emm.swallow = emmeans(modlog3, specs="direction") %>% as.data.frame()
# Analysis
# Anova
Anova(modlog3, type=3) # P-value statistically significant
## Anova Table (Type III tests)
##
## Response: log(body.mass)
## Sum Sq Df F value Pr(>F)
## (Intercept) 830.33 1 98351.8568 < 2.2e-16 ***
## direction 0.06 1 7.3457 0.007059 **
## urbanization 0.09 1 10.5566 0.001272 **
## Residuals 2.90 344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# R^2
summary(modlog3)
##
## Call:
## lm(formula = log(body.mass) ~ direction + urbanization, data = Swallows_direction)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.25506 -0.06628 -0.01066 0.05730 0.30741
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.7942936 0.0089101 313.611 < 2e-16 ***
## direction.L 0.0224021 0.0082656 2.710 0.00706 **
## urbanization -0.0004515 0.0001390 -3.249 0.00127 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09188 on 344 degrees of freedom
## Multiple R-squared: 0.09761, Adjusted R-squared: 0.09237
## F-statistic: 18.61 on 2 and 344 DF, p-value: 2.126e-08
# Effect size
emmeans(modlog3, specs="direction") %>%
eff_size(sigma=sigma(modlog3), edf=df.residual(modlog3))
## contrast effect.size SE df lower.CL upper.CL
## North - South -0.345 0.128 344 -0.596 -0.0932
##
## sigma used for effect sizes: 0.09188
## Confidence level used: 0.95
# eta squared
eta_squared(modlog3, partial=F)
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 | 95% CI
## ----------------------------------
## direction | 0.07 | [0.03, 1.00]
## urbanization | 0.03 | [0.01, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
# partial eta squared
eta_squared(modlog3, partial=T)
## # Effect Size for ANOVA (Type I)
##
## Parameter | Eta2 (partial) | 95% CI
## --------------------------------------------
## direction | 0.07 | [0.03, 1.00]
## urbanization | 0.03 | [0.01, 1.00]
##
## - One-sided CIs: upper bound fixed at [1.00].
# Check assumptions (mod1)
plot(mod3, which=2) # Fails normality assumptions
shapiro.test(mod3$residuals)
##
## Shapiro-Wilk normality test
##
## data: mod3$residuals
## W = 0.96631, p-value = 3.445e-07
plot(mod3, which=3) # Passes homogeneity of variance assumption
# Assumptions fail
# Check assumptions (modlog1) - Go with this one
plot(modlog3, which=2) # Fails normality
shapiro.test(modlog3$residuals)
##
## Shapiro-Wilk normality test
##
## data: modlog3$residuals
## W = 0.98322, p-value = 0.0004551
plot(modlog3, which=3) # Heterogeneity starts wavering
# Assumptions ALSO fail, but it's the best we got.
# Check assumptions (modglm1)
plot(modglm3, which=2) # Fails normality
shapiro.test(modglm3$residuals)
##
## Shapiro-Wilk normality test
##
## data: modglm3$residuals
## W = 0.962, p-value = 7.655e-08
plot(modglm3, which=3) # Passes heterogeneity
# Assumptions fail
# With this, I've exhausted all the possible tests that we've discussed in this course. The most normal lm is log-transformed, so we'll proceed with that.
# ggplot
ggplot() +
geom_jitter(data=Swallows_direction, aes(x=direction, y=body.mass, color=direction)) +
geom_point(data=emm.swallow, aes(x=direction, y=exp(emmean)), color="purple", size=3) +
geom_errorbar(data=emm.swallow, aes(x=direction, ymin=exp(lower.CL), ymax=exp(upper.CL)), width=0.2) +
theme(legend.position = "none") +
theme(plot.title = element_text(hjust=0.5)) + # Center title
theme(title=element_text(size=8)) + # Font size
theme(title=element_text(face="bold")) + # Font bold
ggtitle("Body Size in Swallows: North vs South") + # Creating title
labs(x="Direction", y= "Body Mass (g)")# Label axis
In this analysis, I fit three separate models. None of these models were able to completely pass assumption checks, so I utilized the “most” correct model (modlog2) for my analysis. Connecting back to my biological question, there is a statistically significant difference (p<0.05) in body mass between swallows in the Northern regions vs swallows in the Southern region. However, my biological question was made with Bergmann’s rule in mind; the Southern region actually had a greater average, statistically significant, body mass in grams.
As stated above, my model failed assumptions, so I selected the best-fit model to these assumptions. There is the possibility that I used the wrong GLM.
My figure accurate represents the results of my ANOVA. If you use the eyeball test of confidence intervals, you can see that the confidence intervals do not overlap, suggesting a statistically significant result. If you argue that the confidence intervals do overlap, you’d do the analysis to find the results are signficant: the confidence intervals do not overlap the mean!
Ultimately, my biological question failed, but I managed to get a result that suggested that one region of China had greater body mass (Southern). However, I managed to describe significance, but what is the magnitude of this significance? I ran effect size tests and found Cohen’s D effect size to be -0.345 contrast North - South; eta squared of direction is 7%. These results suggest to me that although our results are statistically significant, there is no practical application to this signficance since our effect size values are so small that it makes no sense in a biological context.
If I were to make a reasoning behind my graphed results despite the low effect size, I would say some variable holds more weight than the effectors in Bergmann’s rule: something in the Southern region of China is causing average body mass to be bigger, despite a warmer climate.
There are some obvious limitations of my interpretation. The first is that since our assumptions failed, then our analysis could be biased. Another is my decision to classify regions of China. I wanted a fair representation to begin with my data, so I made Beijing my center point and everything above and below is filtered as a certain region.
The issue with this classification is that it doesn’t establish the proximity of these cities. Some of these cities are actually part of central China, and some of these cities are port cities. There’s too much diversification, and my degrees of freedom in formatting the data could’ve made my results biased.